Dallas Cowboys Offensive Formation Analysis

Week 1 vs Giants

By Peter Bradicich, September 14th, 2023

image.png

In Week 1, the Dallas Cowboys visited the New York Giants, an NFC East division rival, at MetLife Stadium to open the 2023 season. This analysis focuses on the offense formations the Cowboys ran against the Giants, how successful they were, and where on the field they were run.


Part 1: Import Necessary Modules

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.patheffects as path_effects
from scipy.stats import iqr

# home-built scripts
from import_nfl_pbp import PlayByPlay as Play
from football_field_plot import create_football_field as Field

Part 2: Load & Inspect Week 1 Data

In [2]:
full_df = Play().retrieve_year(2023)
2023 done.
In [3]:
full_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2816 entries, 0 to 2815
Columns: 384 entries, play_id to n_defense
dtypes: float64(202), int32(8), int64(1), object(173)
memory usage: 8.2+ MB

This tells us that there are 2816 rows and 384 columns in the data. That is a lot of columns! In the next step we will filter this data down to what we need to study the Dallas offense formations.

Part 3: Filter Data by Dallas Offense

In [4]:
dallas_df = full_df[(full_df.possession_team == 'DAL') &
                    (full_df.week == 1)][[
    
                    'game_seconds_remaining', # how many seconds are left in the game
                    'week',                   # the week being played
                    'possession_team',        # the team in possession of the ball
                    'offense_formation',      # the formation of the offense
                    'play_type',              # what type of play (pass, run, punt, etc.)
                    'yards_gained',           # how many yards were gained on the play
                    'yrdln'                   # the starting yardline of the play
    
]].reset_index(drop=True)


Now that we have a good dataset to work with, let us inspect the first few rows.


In [5]:
dallas_df.head()
Out[5]:
game_seconds_remaining week possession_team offense_formation play_type yards_gained yrdln
0 3600.0 1 DAL None kickoff 0.0 DAL 35
1 3183.0 1 DAL None extra_point 0.0 NYG 15
2 3183.0 1 DAL None kickoff 0.0 DAL 35
3 3126.0 1 DAL SHOTGUN pass 2.0 DAL 26
4 3097.0 1 DAL SINGLEBACK run 4.0 DAL 28



What was the frequency of each formation type?


In [6]:
dallas_df.offense_formation.value_counts()
Out[6]:
SINGLEBACK    24
SHOTGUN       22
I_FORM         5
EMPTY          4
JUMBO          2
PISTOL         1
Name: offense_formation, dtype: int64

Part 4: Visualize the Distribution

In [7]:
sns.displot(dallas_df, x="yards_gained", hue="offense_formation", element="step", multiple="stack")
plt.title("Offense Formation Frequency", fontsize=24, loc='center')
plt.ylabel("Count", fontsize=16)
plt.xlabel("Yards Gained", fontsize=16)
Out[7]:
Text(0.5, 9.444444444444438, 'Yards Gained')
In [8]:
sns.boxplot(data=dallas_df, x="yards_gained", y="offense_formation",  hue="offense_formation", dodge=False)
plt.title("Yards Gained by Formation", fontsize=24)
plt.ylabel("Offense Formation", fontsize=16)
plt.xlabel("Yards Gained", fontsize=16)
Out[8]:
Text(0.5, 0, 'Yards Gained')


An interesting point is that the data is right-skewed, with outliers in both the Singleback and Shotgun formations. The outliers were good for Dallas! Below is an image of each formation, to give an easier mental picture of what the formations look like on TV.

image-2.png

Part 5: Which Formations Were Successful?


Many factors could go into answering this question. To keep this analysis brief, I rank the formations by the yards they produced. The metric I used is the interquartile range (IQR), which is the length of the box in the boxplot above, i.e. the 75th percentile minus the 25th percentile. The reason: several outliers pull the average yards gained away from what a typical play produced, while the IQR is not affected by them. Keep in mind that the IQR describes the spread of the middle 50% of plays rather than a typical gain, so the average and median are printed alongside it.
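To make the outlier argument concrete, here is a minimal sketch on made-up yardage values (the numbers are illustrative and not taken from the game data). It uses `np.percentile`, the same call as the cell below, and compares the IQR with the mean and median when one long gain is present.

```python
import numpy as np

# Hypothetical yards gained on eight plays; the 49 is a single long gain (an outlier)
yards = np.array([0, 1, 2, 2, 3, 4, 6, 49])

q25, q75 = np.percentile(yards, [25, 75])
iqr_value = q75 - q25            # spread of the middle 50% of plays
mean_value = yards.mean()        # pulled upward by the 49-yard play
median_value = np.median(yards)  # barely moved by the outlier

print(f"IQR={iqr_value}, mean={mean_value}, median={median_value}")
```

In this toy sample the mean lands well above the median, while the IQR only reflects the middle of the distribution.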

In [9]:
for formation in dallas_df.offense_formation.unique():
    if formation is not None:
        formation_filter = dallas_df[dallas_df.offense_formation == formation]['yards_gained']
        IQR = np.round(np.percentile(formation_filter, 75) - np.percentile(formation_filter, 25), 1)
        average = np.round(np.mean(formation_filter), 1)
        median = np.round(np.median(formation_filter), 1)
        print(f"{formation} has an IQR of {IQR}, an Average of {average}, and a Median of {median} yards")
        
SHOTGUN has an IQR of 6.0, an Average of 4.9, and a Median of 2.0 yards
SINGLEBACK has an IQR of 4.8, an Average of 5.2, and a Median of 3.0 yards
I_FORM has an IQR of 2.0, an Average of 1.6, and a Median of 2.0 yards
EMPTY has an IQR of 3.0, an Average of 5.5, and a Median of 6.5 yards
JUMBO has an IQR of 0.5, an Average of 0.5, and a Median of 0.5 yards
PISTOL has an IQR of 0.0, an Average of 0.0, and a Median of 0.0 yards

The Shotgun Formation was the most successful! (6.0 yards IQR)
Followed by the Single Back Formation (4.8 yards IQR)

There may be a reason why these two happened to be the top two most frequent formations...
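As an alternative to the explicit loop, the same per-formation summary can be sketched with a pandas `groupby`, which also reports the play count next to each statistic. The small DataFrame below is a made-up stand-in for `dallas_df`; only the column names come from the notebook, and `iqr_of` is a helper introduced here for illustration.

```python
import pandas as pd

# Made-up stand-in for dallas_df (column names match the notebook, values do not)
toy_df = pd.DataFrame({
    "offense_formation": ["SHOTGUN", "SHOTGUN", "SHOTGUN", "SINGLEBACK", "SINGLEBACK", None],
    "yards_gained": [2.0, 8.0, 5.0, 4.0, 6.0, 0.0],
})

def iqr_of(series):
    """Return the 75th minus the 25th percentile of a Series."""
    return series.quantile(0.75) - series.quantile(0.25)

# groupby drops rows whose key is None/NaN by default, so no explicit None check is needed
summary = (
    toy_df.groupby("offense_formation")["yards_gained"]
    .agg(plays="count", iqr=iqr_of, average="mean", median="median")
    .round(1)
    .sort_values("plays", ascending=False)
)
print(summary)
```

Seeing the count beside the IQR helps when judging formations with very few plays, where the percentiles rest on only one or two observations.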

In [10]:
dallas_df.offense_formation.value_counts()
Out[10]:
SINGLEBACK    24
SHOTGUN       22
I_FORM         5
EMPTY          4
JUMBO          2
PISTOL         1
Name: offense_formation, dtype: int64

Part 6: Plot the Yardline Locations of the Formations

First, the yardlines need to be extracted from the yrdln column, and the data needs to be saved for each formation. I chose to use a dictionary where the keys are the formation names and the values are filtered and parsed Pandas DataFrames.

In [11]:
def clean_formation_data():
    
    cleaned_data = dict()
    
    for formation in dallas_df.offense_formation.unique():
        if formation is not None:
            # There is a None type that we don't want to mess with

            formation_filter = dallas_df[dallas_df.offense_formation == formation][[
                'offense_formation',
                'yards_gained',
                'yrdln'
            ]].reset_index(drop=True)

            # Save the team name to a new column
            formation_filter['team_name'] = [item.split(' ')[0] for item in formation_filter.yrdln]
            # Save the yard line values to a new integer column
            formation_filter['yard_line'] = [item.split(' ')[-1] for item in formation_filter.yrdln]
            formation_filter.yard_line = formation_filter.yard_line.astype(int)
            # Adjust the yard line for plotting purposes. The Giants are on the left side, so they
            # only need 10 yards added to account for the end zone. The Cowboys are on the right
            # side, so their values are subtracted from 110 yards
            formation_filter['adjusted_yard_line'] = [
                (formation_filter.yard_line[x] + 10) if (formation_filter.team_name[x]=='NYG') else
                (110 - formation_filter.yard_line[x]) for x in range(len(formation_filter.team_name))]
            
            # Update the dictionary with the filtered and parsed DataFrame
            cleaned_data.update({formation:formation_filter})
            
    return cleaned_data


Verify that the returned data is the expected result

In [12]:
df = clean_formation_data()
df.get('SHOTGUN').head(2)
Out[12]:
offense_formation yards_gained yrdln team_name yard_line adjusted_yard_line
0 SHOTGUN 2.0 DAL 26 DAL 26 84
1 SHOTGUN 49.0 DAL 32 DAL 32 78


The data looks as expected and the adjusted yardline for plotting looks correct!
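For reference, the parsing and adjustment steps can also be sketched without list comprehensions, using `Series.str.split` and `np.where`. This is a minimal sketch on made-up rows, and it assumes every `yrdln` value has the form "TEAM NN"; a value without a team prefix would need separate handling.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows; only the 'yrdln' format ("TEAM NN") follows the notebook
toy = pd.DataFrame({"yrdln": ["DAL 26", "NYG 15", "DAL 32"]})

# Split "TEAM NN" into a team column and an integer yard-line column in one pass
parts = toy["yrdln"].str.split(" ", expand=True)
toy["team_name"] = parts[0]
toy["yard_line"] = parts[1].astype(int)

# Same rule as the function above: the Giants side adds 10 for the end zone,
# the Cowboys side is measured back from 110
toy["adjusted_yard_line"] = np.where(
    toy["team_name"] == "NYG",
    toy["yard_line"] + 10,
    110 - toy["yard_line"],
)
print(toy)
```

The two DAL rows reproduce the adjusted values shown in the output above (84 and 78).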

In [13]:
fig, ax = Field()
text = ax.text(112, 13, 'Cowboys', color='white', fontsize=36, rotation=270)
text.set_path_effects([path_effects.Stroke(linewidth=3,
                                           foreground='black'),
                                           path_effects.Normal()])

text = ax.text(3, 18, 'GIANTS', color='white', fontsize=30, rotation=90)
text.set_path_effects([path_effects.Stroke(linewidth=3,
                                           foreground='black'),
                                           path_effects.Normal()])
 
scatter1 = plt.scatter(df.get('SINGLEBACK').adjusted_yard_line, np.repeat(10, len(df.get('SINGLEBACK').adjusted_yard_line)),
                       linewidths=1, 
                       label='SINGLEBACK',
                       s=200,
                       marker='<'
)
scatter2 = plt.scatter(df.get('SHOTGUN').adjusted_yard_line, np.repeat(16.7, len(df.get('SHOTGUN').adjusted_yard_line)),
                       linewidths=1,
                       label='SHOTGUN',
                       s=200,
                       marker='<'
)
scatter3 = plt.scatter(df.get('I_FORM').adjusted_yard_line, np.repeat(23.3, len(df.get('I_FORM').adjusted_yard_line)),
                       linewidths=1,
                       label='I_FORM',
                       s=200,
                       marker='<'
)
scatter4 = plt.scatter(df.get('EMPTY').adjusted_yard_line, np.repeat(30, len(df.get('EMPTY').adjusted_yard_line)),
                       linewidths=1, 
                       label='EMPTY',
                       s=200,
                       marker='<'
)
scatter5 = plt.scatter(df.get('JUMBO').adjusted_yard_line, np.repeat(36.6, len(df.get('JUMBO').adjusted_yard_line)),
                       linewidths=1, 
                       label='JUMBO',
                       s=200,
                       marker='<'
)
scatter6 = plt.scatter(df.get('PISTOL').adjusted_yard_line, np.repeat(43.3, len(df.get('PISTOL').adjusted_yard_line)),
                       linewidths=1, 
                       label='PISTOL',
                       s=200,
                       marker='<'
)
ax.legend(handles=[scatter6, scatter5, scatter4, scatter3, scatter2, scatter1],
          loc=1, framealpha=0.95,
          bbox_to_anchor=(0.925, 0.933),
          labelspacing=1.0
)
plt.title("Cowboys Offense Formations", fontsize=30)
plt.show()

This plot shows the yardline position of each formation.

Note: the vertical spacing does not indicate that the play occurred on that part of the field. The formations are separated vertically to avoid overlap.
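The six near-identical scatter calls could also be written as one loop. The sketch below uses made-up positions and a plain Matplotlib axes as a stand-in for the notebook's own `Field()` helper; `toy_positions` and `row_height` are names introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up adjusted yard lines per formation (stand-in for the dictionary built earlier)
toy_positions = {
    "SINGLEBACK": [84, 78, 60],
    "SHOTGUN": [84, 70],
    "I_FORM": [25],
}
# Vertical rows copied from the cell above; they only keep the markers from overlapping
row_height = {"SINGLEBACK": 10, "SHOTGUN": 16.7, "I_FORM": 23.3}

fig, ax = plt.subplots()  # plain axes here; the notebook draws on its Field() figure
handles = []
for formation, x_values in toy_positions.items():
    y_values = np.repeat(row_height[formation], len(x_values))
    handles.append(
        ax.scatter(x_values, y_values, s=200, marker="<", linewidths=1, label=formation)
    )

# Reverse the handles so the legend order matches the top-to-bottom order on the plot
ax.legend(handles=handles[::-1], labelspacing=1.0)
```

Adding a formation then only requires a new dictionary entry and a row height rather than another copied block.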

Part 7: Takeaways

This brief study shows that the Dallas Cowboys ran six offense formations, with:

  • Shotgun and Singleback being the most common
  • Shotgun being the most successful by the IQR metric (6.0 yards, meaning the middle 50% of Shotgun plays spanned a 6.0-yard range of gains)
  • I-Formation mainly executed within the opponent's 25 yard line
  • Empty formation filling the gaps of the Shotgun formation; the goal may have been to mix up play calls with a similar formation